Stats 101 Spring 2024 Midterm
Property report
Assignment instructions
Task
Imagine you are hired by a property management firm in the city of Toronto that wants to understand what features of properties lead to higher rents. In particular, there are a number of features beyond the square footage of an apartment that may lead to higher rents for the property management firm.
Specifically, the client asks you to analyze a few specific things.
- Consider the distribution of rents and whether there are any outliers
- Think about which rental arrangements are most common - bedrooms vs. building type
- Create a regression model to consider what factors are most important in predicting the Price of an apartment
- Consider what lurking variables you may be missing that might change your analysis if you had access to them.
Finally, your firm asks you to make some recommendations about future rental units your firm should acquire based on your analysis of all of the above points.
Specific requirements
- Save this document as a new document (Save As…) and rename it Toronto rental report.
- Rename the title of your report to Toronto rental report
- Delete the Assignment instructions, Task, Specific requirements and Points of emphasis sections
- Final report should be minimum 1200 words
- Maximum 6 graphs
- Graphs can be combined and only counted as one as long as they are the same type - a combined plot of histograms of predictor variables, for example
- Maximum 4 tables
- Suggested structure:
  - Introduction
  - Summary data
    - Rent distribution
    - Most common rental types
  - Rent price model
    - Regression model
    - Regression diagnostics
    - Interpreting coefficients
    - Sample rentals
  - Conclusion
The definition of the variables can be found here.
There will be a midterm check, due Sunday, April 7th at 11:59:00 pm, to make sure you are making good progress.
You need to submit a relatively finished version of the first two sections (Introduction and Summary data) by the Sunday deadline. The grade will be pass/fail. The submitted check does not have to be in its final form (you can modify it later if you choose) or be polished (some shown code / text not formatted perfectly is OK).
However, both sections should be substantially complete - all code necessary to fulfill the requirements of the section included and all text necessary to meet the requirements written.
Final version will be due Sunday, April 14th at 11:59:00
Please make sure to submit both the .qmd and .html files
Points of emphasis
Format
- Your job as an analyst is to write a report analyzing the state of the property market in Toronto, and, in particular, the determinants of rental price. The report should be a polished document you are proud to present to a client. The writing should be business professional, the document neat and tidy.
This document is not a formal essay. Some of the differences between a business report and a formal essay are summarized in the table below (this table and more advice on writing business reports can be found here):
Substance
For the rent distribution analysis, you want to view both the simple distribution of rents (looking for any outliers) and also consider rent according to key categorical variables - ideally the same categorical variables that you examine in your rental model.
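One possible way to approach the rent distribution step is sketched below in base R. The data frame is made up for illustration, BuildingType is a hypothetical column name (only Price is named in this assignment), and the 1.5 x IQR rule is just one common convention for flagging outliers, not a required method.

```r
# Made-up illustration data. Only the column name Price comes from the
# assignment; BuildingType and its levels are hypothetical.
rentals <- data.frame(
  Price = c(1800, 2100, 1950, 2400, 2250, 2600, 2000, 7500),
  BuildingType = c("Condo", "Condo", "House", "Condo",
                   "House", "Condo", "House", "Condo")
)

# Flag potential outliers with the 1.5 * IQR rule (one common convention)
q <- unname(quantile(rentals$Price, probs = c(0.25, 0.75)))
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
rentals$Outlier <- rentals$Price < lower | rentals$Price > upper

# Rows flagged as potential outliers
rentals[rentals$Outlier, ]

# Simple distribution of rent, then rent by a categorical variable
hist(rentals$Price, main = "Distribution of rent", xlab = "Price")
boxplot(Price ~ BuildingType, data = rentals, ylab = "Price")

# Median rent per category (the median is less sensitive to outliers)
aggregate(Price ~ BuildingType, data = rentals, FUN = median)
```

Whether a flagged rental should be dropped, kept, or discussed separately is a judgment call that belongs in the written interpretation.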
For the rental type analysis, a nice-looking contingency table followed by an interpretation of your results would be sufficient. You may want to recode the Bedrooms variable to be more sensible. Be thoughtful with your reasoning here.
For the rental price model, you should limit your focus to one response variable and a maximum of 3-4 predictor variables that you think are most important in your analyses. You can consider multiple, related models if you wish - models that share most predictor variables but differ by one or two secondary variables.
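For the rental type analysis, recoding Bedrooms and building a contingency table might look like the sketch below. The data are made up, BuildingType is a hypothetical column name, and the grouping chosen here (treating 0 bedrooms as a studio and collapsing 3 or more bedrooms into one group) is only an assumption about what a sensible recode could be.

```r
# Made-up illustration data. Bedrooms is named in the assignment; the values
# and the BuildingType column are hypothetical.
rentals <- data.frame(
  Bedrooms = c(0, 1, 1, 2, 2, 3, 4, 1, 2, 5),
  BuildingType = c("Condo", "Condo", "House", "Condo", "House",
                   "House", "House", "Condo", "Condo", "House")
)

# Collapse sparse bedroom counts into a single "3+" group
rentals$BedroomGroup <- cut(
  rentals$Bedrooms,
  breaks = c(-Inf, 0, 1, 2, Inf),
  labels = c("Studio", "1", "2", "3+")
)

# Counts of each bedroom group by building type, with row and column totals
tab <- table(rentals$BedroomGroup, rentals$BuildingType)
addmargins(tab)

# Share of all rentals falling in each cell
round(prop.table(tab), 2)
```

The same table can be passed to a table-formatting function of your choice for the final report; the choice of cut points should be justified in the text.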
Whether to include a variable in your model should depend on whether the variable is important in evaluating the client's demand. Cite a source or two to help you develop expectations and select the variables that you include in your analysis.
Pay attention to the levels of your categorical variables - consider recoding the variables to collapse categories that are essentially the same or to eliminate categories that have very small numbers of observations (if, in your judgement, it does not substantively affect your analysis).
For your regression model, do not exclude a variable just because it initially does not meet the regression requirements if it is substantively important. However, consider carefully whether some variables are actually highly related to another predictor variable: do not include several measures of the same basic concept (collinearity). In general, you want to start by picking the variables you think matter the most in determining rental price and then work to try to understand their relationship to rental price.
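A quick way to screen for collinearity among numeric predictors is to look at their pairwise correlations, as in the sketch below. The data are made up, SqFt and Bathrooms are hypothetical column names, and the 0.8 cutoff is an arbitrary illustrative threshold rather than a rule from this assignment.

```r
# Made-up illustration data. SqFt and Bathrooms are hypothetical predictor
# names; Price and Bedrooms are named in the assignment.
rentals <- data.frame(
  Price = c(1800, 2100, 1950, 2400, 2250, 2600, 2000, 3100),
  SqFt = c(500, 650, 600, 800, 750, 900, 620, 1100),
  Bedrooms = c(0, 1, 1, 2, 2, 2, 1, 3),
  Bathrooms = c(1, 1, 1, 2, 1, 2, 1, 2)
)

# Pairwise correlations among the candidate predictors
predictors <- rentals[, c("SqFt", "Bedrooms", "Bathrooms")]
cor_mat <- cor(predictors)
round(cor_mat, 2)

# List each predictor pair whose absolute correlation exceeds the cutoff
cutoff <- 0.8  # illustrative threshold, an assumption
high <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r = cor_mat[high]
)
high_pairs
```

A highly correlated pair suggests the two variables may measure the same basic concept, in which case keeping the one that is substantively more important is usually enough.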
You should focus your graphs and tables on illustrating the most important information for drawing your conclusion. Choose your tables carefully such that they convey the key information needed to arrive at your conclusion. Do not make tables and graphs of irrelevant information or points that do not need discussing. Multiple graphs of the same type (for example, distribution plots) can be combined using the gridExtra package, but unrelated plots should be listed separately.
Make sure to also interpret the coefficients. You need to interpret the impact of a one unit change in the predictor variable on the response variable. You additionally need to examine whether changes in the predictor variables lead to a substantively large or small change in the response variable (Price). One way to do this is examining whether changing the predictor variable from its Q1 to Q3 value leads to a large or small change in the response variable. You may want to make a table with this information.
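The Q1 to Q3 comparison can be sketched as follows: multiply each coefficient by the interquartile range of its predictor to get the predicted change in Price. The data and the SqFt column are made up for illustration, and the model shown is only an example, not the model you are expected to fit.

```r
# Made-up illustration data. SqFt is a hypothetical predictor name.
rentals <- data.frame(
  Price = c(1800, 2100, 1950, 2400, 2250, 2600, 2000, 3100),
  SqFt = c(500, 650, 600, 800, 750, 900, 620, 1100),
  Bedrooms = c(0, 1, 1, 2, 2, 2, 1, 3)
)

# Example model; the coefficient for each numeric predictor is the
# predicted change in Price for a one unit change in that predictor
model <- lm(Price ~ SqFt + Bedrooms, data = rentals)
b <- coef(model)[c("SqFt", "Bedrooms")]

# First and third quartile of each predictor
q1 <- sapply(rentals[names(b)], function(x) unname(quantile(x, 0.25)))
q3 <- sapply(rentals[names(b)], function(x) unname(quantile(x, 0.75)))

# Predicted change in Price when a predictor moves from its Q1 to its Q3,
# holding the other predictor fixed
effect_table <- data.frame(
  predictor = names(b),
  coefficient = unname(b),
  Q1 = unname(q1),
  Q3 = unname(q3),
  change_in_Price = unname(b * (q3 - q1))
)
effect_table
```

This comparison applies to numeric predictors; for a categorical predictor, the coefficient itself already gives the predicted difference from the reference level.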
Introduction
Discuss what your expectations are here and how you arrived at your expectation, including the source of your expectation.
Summary data
Rent distribution
Consider and interpret how rent is distributed, examine outliers, and consider rental price distribution by key categorical variables
Rental type analysis
Discuss and interpret your findings related to the most common rental type
Rental price model
In this section you will fully develop a model of the rental price of an apartment. You will also want to examine any key two-way relationships with a scatterplot.
Regression model
State clearly what your proposed model is and why you selected the relevant predictor variables.
Regression diagnostics
Check to see how well your model fits the data here.
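As a starting point, a few common base R checks of model fit are sketched below: adjusted R-squared, a residuals vs. fitted plot, and a normal Q-Q plot of the residuals. The data and the SqFt column are made up for illustration, and these checks are one reasonable selection rather than a complete list.

```r
# Made-up illustration data. SqFt is a hypothetical predictor name.
rentals <- data.frame(
  Price = c(1800, 2100, 1950, 2400, 2250, 2600, 2000, 3100),
  SqFt = c(500, 650, 600, 800, 750, 900, 620, 1100),
  Bedrooms = c(0, 1, 1, 2, 2, 2, 1, 3)
)
model <- lm(Price ~ SqFt + Bedrooms, data = rentals)

# Share of variation in Price explained, adjusted for the number of predictors
adj_r2 <- summary(model)$adj.r.squared
adj_r2

# Residuals vs. fitted values: look for curvature or a funnel shape
plot(fitted(model), resid(model),
     xlab = "Fitted Price", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal residuals
qqnorm(resid(model))
qqline(resid(model))
```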
Interpreting coefficients
Interpret the magnitude of your coefficients here.
Sample rentals
Examine 3 different rental properties in detail. Calculate the predicted value for each rental according to your model and the residual. Consider reasons why the rental did not fit the model.
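The predicted value and residual for individual rentals can be computed as in the sketch below. The data and the SqFt column are made up for illustration, and picking the three rentals with the largest absolute residuals is just one possible way to choose which properties to examine.

```r
# Made-up illustration data. SqFt is a hypothetical predictor name.
rentals <- data.frame(
  Price = c(1800, 2100, 1950, 2400, 2250, 2600, 2000, 3100),
  SqFt = c(500, 650, 600, 800, 750, 900, 620, 1100),
  Bedrooms = c(0, 1, 1, 2, 2, 2, 1, 3)
)
model <- lm(Price ~ SqFt + Bedrooms, data = rentals)

# Predicted Price from the model, and residual = actual - predicted
rentals$Predicted <- as.numeric(fitted(model))
rentals$Residual <- rentals$Price - rentals$Predicted

# One option: examine the three rentals the model fits worst
worst <- order(abs(rentals$Residual), decreasing = TRUE)[1:3]
sample_rentals <- rentals[worst, ]
sample_rentals
```

A positive residual means the rental costs more than the model predicts, and a negative residual means it costs less; features not in the model are a natural place to look for reasons.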
Conclusion
Add conclusion and recommendations here. Note also any limitations of your analysis and what other forms of advanced analysis could be conducted to strengthen the analysis in the future.